Abstract

The effect of different Monte Carlo move sets on the folding kinetics
of lattice polymer chains is studied through the geometry of the
conformation-networks. The networks have small-world characteristics. One
Monte Carlo move, rigid rotation, has a drastic effect on the geometric
properties of the network: it not only changes the connections but also
greatly reduces the shortest path length between conformations. The
networks are as robust as random networks.

Protein folding is a complex process in which a sequence of amino acids
folds into a unique and stable structure in a relatively short time [1]. Lattice
models have been widely used as coarse-grained models for the theoretical study
of the folding process. In the lattice models, a protein is viewed
as a chain of m monomers, and the conformations are given by all possible
self-avoiding walks of the chain on two-dimensional (2D) or three-dimensional
(3D) regular lattices. The energy of a conformation generally depends on the
number of intrachain contacts, and the assignment of contact energies is model
dependent. The kinetics of the folding process is then studied by Monte Carlo
simulations, for which a move set is designed to change the conformations.
In principle, different move sets that satisfy the requirement of ergodicity
should reach the same equilibrium canonical distribution after sufficiently long
simulations. However, different move sets may yield different pictures of the
folding kinetics. The question of how different move sets affect folding kinetics
was discussed earlier by Chan and Dill. Based on two different move sets, they
used the transfer matrix with the Metropolis criterion to study the folding
kinetics of two-dimensional homo- and heteropolymers. Their results indicate
that the kinetic sequence of folding events and the shape of the energy landscape
depend strongly on the move set. Hoang and Cieplak reached the same conclusions
after comparing the dynamics of three different move sets [8]. Therefore, it
is important to understand the nature of the move set adopted in lattice
dynamics.



The purpose of this Letter is to explore the characteristics of different move
sets from the geometric properties of the corresponding conformation-networks
[9, 10]. We study the conformation spaces of homopolymers with $m \le 16$ on
the 2D square lattice for different move sets. Although the chain lengths
considered in this work are relatively short, they allow us to construct the
networks by exact enumeration. First, what are the characteristic features of
the conformation-networks obtained from different move sets? It was shown by
Scala et al. that the geometric properties of the conformation-network
obtained from the mapping of a particular conformation space are similar to those
of small-world networks. A small-world network is characterized by two
properties: the local connections are as cliquish as those of regular lattices,
and the characteristic path length increases logarithmically with the number of
nodes. Do the conformation-networks obtained from different move sets all have
the characteristics of small-world networks? To answer this, we analyze the
characteristic path lengths and clustering coefficients of the networks. Second,
what are the differences in the geometric properties of the networks? This leads
us to analyze the distribution functions of the edge number per node and of the
shortest path lengths between two nodes. Finally, we also discuss the stability
of the networks.

For dynamical simulations of lattice polymers, the typical Monte Carlo
moves include (i) end flips, (ii) corner shift, (iii) crankshaft move, and (iv)
rigid rotation, as shown in Fig. 1. Based on these moves, we consider three
different move sets, SA, SB, and SC, defined as follows. The moves (i),
(ii), and (iii), which change the positions of only one or two monomers in a
move, are relatively local in comparison with the move (iv). Because of this
locality, these moves have been adopted very often in the literature [7]. We
refer to these moves as SA. Note that SA may not satisfy the requirement of
ergodicity: in the case of the 2D square lattice, all exactly enumerated
conformations are reachable for m < 16, one conformation is unreachable for
m = 16, and the number of unreachable conformations may become large for
m > 16. The move (iv) changes the positions of more monomers, and it makes
some simple diffusive motions of groups of monomers possible [15]. Since the
move (i) can be viewed as a short-scale rigid rotation, we combine the move (iv)
with the move (i) to form the set of rotational moves, SB. Note that the move
(ii) or (iii) from a conformation can be achieved by two or more sequential
moves of SB, and all conformations unreachable by SA can be reached by the moves
of SB. Thus, SA may be viewed as a subset of SB. Finally, we refer to the move
set containing all the moves as SC.
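
As an illustration of the local move set SA, the following Python sketch
generates all conformations reachable from a given one by a single end flip,
corner shift, or crankshaft move on the 2D square lattice. The function name
`local_moves`, the coordinate-tuple representation, and the precise move rules
(in particular, letting an end monomer hop to any free site adjacent to its
bonded neighbor) are assumptions of this sketch rather than definitions taken
from the text. Rigid rotations are not included, and no symmetry reduction is
applied.

```python
# Minimal sketch of the local moves (i)-(iii) on the 2D square lattice.
# A conformation is a tuple of (x, y) lattice sites, one per monomer.
# All names and the exact move rules are illustrative assumptions.

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def local_moves(chain):
    """Return the set of conformations one local move away from `chain`."""
    chain = tuple(chain)
    n = len(chain)
    occupied = set(chain)
    neighbors = set()

    def replaced(changes):
        # Copy of the chain with {monomer index: new site} substitutions.
        new = list(chain)
        for index, site in changes.items():
            new[index] = site
        return tuple(new)

    # (i) End flip: an end monomer hops to a free site next to its bonded
    # neighbor (assumed rule; the current site counts as occupied).
    if n >= 2:
        for end, anchor in ((0, 1), (n - 1, n - 2)):
            ax, ay = chain[anchor]
            for dx, dy in STEPS:
                site = (ax + dx, ay + dy)
                if site not in occupied:
                    neighbors.add(replaced({end: site}))

    # (ii) Corner shift: an interior monomer at a right-angle corner jumps to
    # the opposite corner of the unit square. For a straight segment the
    # candidate site equals the current site, so it is skipped automatically.
    for i in range(1, n - 1):
        (px, py), (cx, cy), (qx, qy) = chain[i - 1], chain[i], chain[i + 1]
        site = (px + qx - cx, py + qy - cy)
        if site not in occupied:
            neighbors.add(replaced({i: site}))

    # (iii) Crankshaft: monomers i and i+1 close a U shape whose feet
    # (i-1 and i+2) are lattice neighbors; the pair is flipped by 180 degrees.
    for i in range(1, n - 2):
        a, b, c, d = chain[i - 1:i + 3]
        if abs(a[0] - d[0]) + abs(a[1] - d[1]) == 1:
            new_b = (2 * a[0] - b[0], 2 * a[1] - b[1])
            new_c = (2 * d[0] - c[0], 2 * d[1] - c[1])
            if new_b not in occupied and new_c not in occupied:
                neighbors.add(replaced({i: new_b, i + 1: new_c}))

    return neighbors
```

Returning a set of full coordinate tuples keeps the sketch easy to check by
hand; mapping each result to a symmetry-reduced representative would be a
separate step when building the network.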

Based on the move sets SA, SB, and SC, we construct the respective
conformation-networks as follows. First, we enumerate all possible
self-avoiding conformations of the chain of m monomers; these N(m)
conformations are the nodes of the network. Note that the degeneracy caused by
rotation and mirror symmetry has been excluded from N(m). Then, an edge exists
between two nodes if a move of the given move set can change one conformation
into the other. The move sets SA, SB, and SC yield different distributions of
edges among the nodes and hence different networks. The networks can be viewed
as the folding networks in the high-temperature limit, in which all edges have
the same weight. We refer to the resultant networks as GA, GB, and GC for the
move sets SA, SB, and SC, respectively. The numbers of nodes N(m) and the
numbers of edges E(m) of the different networks for different numbers of
monomers m are listed in Table 1.
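
The node enumeration can be sketched in Python as follows. The sketch removes
the rotational degeneracy by fixing the first bond along +x and the mirror
degeneracy by requiring the first turn to point toward +y; this particular
canonical form and the name `enumerate_conformations` are assumptions made for
illustration, since the text does not specify how the symmetry reduction was
implemented.

```python
# Minimal sketch: enumerate self-avoiding walks of m monomers on the square
# lattice, keeping one representative per rotation/mirror class.
# The canonical form used here is an illustrative assumption.

def enumerate_conformations(m):
    """List conformations of m monomers as tuples of (x, y) sites."""
    if m < 2:
        raise ValueError("need at least two monomers")
    found = []
    chain = [(0, 0), (1, 0)]   # first bond fixed along +x (removes rotations)
    occupied = set(chain)

    def extend(turned):
        if len(chain) == m:
            found.append(tuple(chain))
            return
        x, y = chain[-1]
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if not turned and dy < 0:
                continue       # first turn must point to +y (removes mirrors)
            site = (x + dx, y + dy)
            if site in occupied:
                continue
            chain.append(site)
            occupied.add(site)
            extend(turned or dy != 0)
            occupied.remove(site)
            chain.pop()

    extend(False)
    return found


if __name__ == "__main__":
    for m in (10, 12):
        print(m, len(enumerate_conformations(m)))
```

For m = 10 and m = 12 the counts agree with the values N = 2034 and N = 15037
given in Table 1.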

First, we analyze the edge distributions of the networks. The number of
edges associated with a node is the number of allowed transitions from one
conformation to the others. The spread in the number of edges is characterized
by a distribution function P(k), which gives the probability for a node to have
k edges. The average edge number per node is then given by

$$\langle k \rangle = \sum_{k} k\,P(k). \tag{1}$$

The results for $\langle k \rangle$ are listed in Table 1, and they all scale as
$\langle k \rangle = a + b \log N(m)$, as shown in the insets of Fig. 2, with
(a, b) given as (3.79, 0.92) for GA, (3.07, 2.99) for GB, and (2.77, 4.01) for
GC. Thus, the average edge number per node grows with log N about three (four)
times faster for the move set SB (SC) than for SA; at m = 16 it is already more
than twice (nearly three times) as large (Table 1). This gives more routes to
the native conformation and reduces the chance of being trapped in a local
minimum during folding for the move sets SB and SC.
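
The tabulated averages can be cross-checked against the node and edge counts,
since every edge contributes to the degree of two nodes, so that
$\langle k \rangle = 2E/N$. The Python sketch below does this for the Table 1
values and repeats the logarithmic fit. Two points are assumptions of the
sketch: the logarithm is taken to base 10 (the choice that is consistent with
the quoted coefficients), and only the four tabulated chain lengths are used,
so the fitted (a, b) can only approximately reproduce the values quoted above.

```python
import math

# Node and edge counts copied from Table 1 (m = 10, 12, 14, 16).
NODES = [2034, 15037, 110188, 802075]
EDGES = {
    "GA": [6966, 57451, 464687, 3702485],
    "GB": [13194, 117839, 1005304, 8314161],
    "GC": [16397, 147673, 1268544, 10554679],
}


def mean_degree(n_nodes, n_edges):
    """Average edge number per node; each edge is shared by two nodes."""
    return 2.0 * n_edges / n_nodes


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def degree_scaling(label):
    """Fit <k> = a + b * log10(N) for one network (base 10 is an assumption)."""
    xs = [math.log10(n) for n in NODES]
    ys = [mean_degree(n, e) for n, e in zip(NODES, EDGES[label])]
    return linear_fit(xs, ys)


if __name__ == "__main__":
    for label in EDGES:
        a, b = degree_scaling(label)
        print(f"{label}: a = {a:.2f}, b = {b:.2f}")
```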

Our results for P(k) versus $\Delta k = k - \langle k \rangle$ for GA, GB, and
GC with m = 10, 12, 14, and 16 are shown in Figs. 2(a), 2(b), and 2(c),
respectively. Amaral et al. studied subnetworks of GA in which the end-to-end
distance is fixed at a specified value for each network and the edges between
nodes are generated by the corner shift and crankshaft moves [5, 17]. Their
results showed that the form of P(k) is Gaussian. We therefore find the best
fits of the Gaussian function,

$$P(k) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(k - \langle k \rangle)^2}{2\sigma^2}\right], \tag{2}$$

for GA, GB, and GC, shown as the solid lines in Fig. 2. Our results indicate the
following: (i) For GA, the distribution agrees with the Gaussian form, with the
width averaged over different m given by $\sigma_{GA} = 0.5748\sqrt{m}$.
(ii) In comparison with GA, there are significant deviations from the
Gaussian form for GB and GC, as shown in Figs. 2(b) and 2(c),
but the distributions are clearly not scale-free [17].
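
A minimal Python sketch of the comparison between a degree distribution and a
Gaussian is given below. It assumes the network is stored as a dictionary that
maps each node to the set of its neighbors, and it sets the Gaussian parameters
from the sample mean and standard deviation (moment matching) instead of the
least-squares fit used in the text, so the resulting widths need not coincide
with the fitted ones. The function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Return P(k) as {k: fraction of nodes with k edges}."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def gaussian_from_moments(pk):
    """Return (mean, sigma, g), where g(k) is the Gaussian density that has
    the same mean and standard deviation as the distribution pk."""
    mean = sum(k * p for k, p in pk.items())
    var = sum((k - mean) ** 2 * p for k, p in pk.items())
    if var == 0:
        raise ValueError("all nodes have the same degree")
    sigma = math.sqrt(var)

    def g(k):
        return math.exp(-((k - mean) ** 2) / (2 * var)) / (
            math.sqrt(2 * math.pi) * sigma
        )

    return mean, sigma, g


if __name__ == "__main__":
    # Toy example: a path of four nodes (degrees 1, 2, 2, 1).
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    pk = degree_distribution(path)
    mean, sigma, g = gaussian_from_moments(pk)
    for k, p in pk.items():
        print(k, p, g(k))
```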

The degree of local connections of the networks can be measured by the
clustering coefficients. We define the clustering coefficient of the node i as

$$C_i = \frac{2E_i}{k_i (k_i - 1)}, \tag{3}$$

where $k_i$ is the edge number of the node i and $E_i$ is the number of existing
edges among its $k_i$ neighboring nodes. The degree of local connections of the
network can then be characterized by the average of the clustering coefficients
of the nodes, denoted by $C_{av}$. The values of $C_{av}$ for GA, GB, and GC
with different m values are listed in Table 1. The results show
$C_{av}^{GA} > C_{av}^{GB} > C_{av}^{GC}$, which implies that the Monte Carlo
simulation with the move set SA has more chances to be trapped among cliquish
conformations than those with SB and SC. For a network with node number N and
average edge number $\langle k \rangle$, the corresponding random network has
the average clustering coefficient $C_{av}^{ran} \approx \langle k \rangle / N$.
The ratios of $C_{av}$ of GA, GB, and GC to $C_{av}^{ran}$ are shown in Fig. 3,
and they indicate that the average clustering coefficients of the
conformation-networks are much larger than those of the random networks.
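
For concreteness, the clustering analysis can be sketched in Python as below,
using the standard definition $C_i = 2E_i / [k_i(k_i - 1)]$ and the
random-network estimate $\langle k \rangle / N$. The adjacency-dictionary
representation, the function names, and the convention that nodes with fewer
than two neighbors have $C_i = 0$ are assumptions of this sketch.

```python
def clustering_coefficient(adj, node):
    """C_i = 2 E_i / (k_i (k_i - 1)), where E_i counts the edges that exist
    among the neighbors of `node`. Returns 0.0 when k_i < 2 (assumed rule)."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbors[b] in adj[neighbors[a]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Average of C_i over all nodes."""
    return sum(clustering_coefficient(adj, node) for node in adj) / len(adj)


def random_network_clustering(adj):
    """Estimate <k>/N for a random network with the same N and <k>."""
    n = len(adj)
    mean_degree = sum(len(neighbors) for neighbors in adj.values()) / n
    return mean_degree / n


if __name__ == "__main__":
    # Toy example: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(average_clustering(toy) / random_network_clustering(toy))
```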

Next, we analyze the path lengths between two nodes of the network. The
minimum number of Monte Carlo moves required for one conformation to reach
another can be viewed as the distance between the two conformations. Thus,
the shortest path length l between two nodes can be defined as the minimum
number of edges required to connect the two nodes. Our results indicate that the
distribution functions of the shortest path lengths, P(l), of the different
networks all agree with the Gaussian form of Eq. (2). In Fig. 4 we plot the
scaled distribution function, $\tilde{P}(l) = \sqrt{2\pi}\,\sigma P(l)$, versus
$\Delta l = (l - \langle l \rangle)/(\sqrt{2}\,\sigma)$ for GA, GB, and GC with
m = 10, 12, 14, and 16. Here, we take the widths $\sigma$ as
$\sigma_{GA} = 0.0489\,m^{1.7}$, $\sigma_{GB} = 0.3057\,m^{0.6}$, and
$\sigma_{GC} = 0.5057\,m^{0.6}$, which are determined by first finding the
least-squares fit to Eq. (2) to obtain $\sigma(m)$ for a given m, and then
averaging over the $\sigma(m)$ of different m to obtain $\sigma$ for a given
network. The width of P(l) for GB is much smaller than that for GA, which
implies that the shortest distance between any two nodes does not vary much for
the networks GB and GC.
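
Because all edges carry the same weight, shortest path lengths can be obtained
by breadth-first search. The Python sketch below computes P(l) and its mean for
a network stored as a dictionary of neighbor sets; the representation and the
function names are assumptions made for illustration. Running a search from
every node costs on the order of N times E operations, so for the largest
networks considered here a practical calculation would need a more careful
implementation or sampling of source nodes.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from `source` to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_distribution(adj):
    """Return P(l) as {l: fraction of connected node pairs at distance l}."""
    counts = Counter()
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                counts[d] += 1   # every unordered pair is counted twice
    total = sum(counts.values())
    return {l: c / total for l, c in sorted(counts.items())}


def characteristic_path_length(adj):
    """Mean of the shortest path length over all connected node pairs."""
    return sum(l * p for l, p in path_length_distribution(adj).items())


if __name__ == "__main__":
    # Toy example: a path of four nodes.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(path_length_distribution(path))
    print(characteristic_path_length(path))
```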

The characteristic path length of the network, $\langle l \rangle$, is defined
as the average of the shortest path lengths over all node pairs,

$$\langle l \rangle = \frac{2}{N(N-1)} \sum_{i<j} l_{ij}, \tag{4}$$

where $l_{ij}$ is the shortest path length between the nodes i and j. The values
of $\langle l \rangle$ for GA, GB, and GC with different m values are listed in
Table 1. The results indicate that the characteristic path length of GB is
about half that of GA. For small-world networks, there exists a
cross-over size $N^{*}$, which is of the same order as the inverse of the
rewiring probability p, such that the characteristic path length
$\langle l \rangle$ obeys the finite-size scaling law

$$\langle l \rangle \sim (N^{*})^{1/d} f\!\left(\frac{N}{N^{*}}\right), \tag{5}$$

where d is the dimensionality of the underlying regular lattice, and f(x) is a
scaling function with the limits $f(x) \sim x^{1/d}$ for $x \ll 1$ and
$f(x) \sim \ln x$ for $x \gg 1$. Adopting the hypothesis that the
conformation-network may be a small-world network, we use the scaling form of
Eq. (5) to fit the data, and the results are shown in Fig. 5. The fits indicate
that (i) the values of $\langle l \rangle$ increase logarithmically with the
node number N for large N; (ii) the fits at small N give 1/d = 0.3427, 0.2377,
and 0.2155 for the networks GA, GB, and GC, respectively, so that our estimates
of d are $d_{GA} \approx 3$, $d_{GB} \approx 4$, and $d_{GC} \approx 4.5$; and
(iii) the cross-over region $N^{*}(m)$ is around m = 9 to 11
($p \sim 10^{-3}$ to $10^{-4}$) for GA and around m = 8 ($p \sim 10^{-3}$) for
GB and GC. Note that we do not use the data for m < 4, for which the node
number N is less than 5, and hence few data points are available in the region
of small N. Based on the above results, we may conclude that the dimensionality
of the conformation space is $d \ge 3$, and that the cross-over region becomes
narrower as the dimensionality gets larger.
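
The dimensionalities quoted above follow from inverting the fitted small-N
exponents 1/d. Written out as a worked step, with the rounding to two decimals
added here for illustration:

```latex
\[
d_{GA} \approx \frac{1}{0.3427} \approx 2.92, \qquad
d_{GB} \approx \frac{1}{0.2377} \approx 4.21, \qquad
d_{GC} \approx \frac{1}{0.2155} \approx 4.64 .
\]
```

These values are consistent with the rounded estimates of about 3, 4, and 4.5
for GA, GB, and GC.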

Finally, we analyze the attack and error tolerance of the networks
by studying the fragmentation caused by node removal [22]. For the analysis of
attack tolerance, the nodes with higher degrees of connection are removed
preferentially; for error tolerance, the nodes are removed randomly. After
removing a fraction f of the nodes, we measure the fraction of nodes contained
in the largest cluster, S, and the average node number, $\langle s \rangle$,
contained in the fragmentary clusters excluding the largest one. If only the
removed nodes were missing, without further breaking of the largest cluster, the
S value would decrease from 1 down to 0 along the diagonal line as f increases
from 0 up to 1, and the $\langle s \rangle$ value would remain at one for
0 < f < 1 if the removed nodes were isolated from each other. For most networks,
we may expect that the S value starts to decrease more rapidly than the diagonal
line at some fraction $f_m$ and drops to zero at the critical fraction $f_c$,
while the $\langle s \rangle$ value starts to increase more rapidly from
$\langle s \rangle = 1$ at $f_m$ and reaches its maximum at $f_c$. The results
for S and $\langle s \rangle$ as functions of f are shown in Fig. 6 for the
networks GA, GB, and GC with m = 16. Our results show that the $f_c$ value is
very close to 1, and the stability of the networks is closely analogous to that
of random networks.
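
A node-removal experiment of this kind can be sketched in Python as follows.
Several choices in the sketch are assumptions and are not taken from the text:
the attack removes nodes in order of their initial degree without recomputing
degrees after each removal, S is normalized by the original node number, and
the average fragment size is reported as 1.0 when no cluster other than the
largest one exists. The function names are hypothetical.

```python
import random
from collections import deque


def cluster_sizes(adj, removed):
    """Sizes of the connected clusters that remain after deleting `removed`."""
    seen = set(removed)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sizes


def fragmentation(adj, fraction, attack, rng=None):
    """Return (S, <s>) after removing a fraction of the nodes.

    attack=True removes the highest-degree nodes first (initial degrees,
    an assumed rule); attack=False removes nodes uniformly at random.
    """
    n = len(adj)
    n_remove = int(round(fraction * n))
    if attack:
        order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    else:
        order = list(adj)
        (rng or random).shuffle(order)
    sizes = sorted(cluster_sizes(adj, set(order[:n_remove])), reverse=True)
    if not sizes:
        return 0.0, 1.0
    largest_fraction = sizes[0] / n      # normalized by the original N
    rest = sizes[1:]
    # Convention of this sketch: report 1.0 when there are no other fragments.
    mean_fragment = sum(rest) / len(rest) if rest else 1.0
    return largest_fraction, mean_fragment


if __name__ == "__main__":
    # Toy example: a star with center 0 and four leaves.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(fragmentation(star, 0.2, attack=True))
    print(fragmentation(star, 0.2, attack=False, rng=random.Random(1)))
```

On the toy star, removing the single hub under attack leaves only isolated
leaves, which illustrates why networks dominated by a few hubs are fragile
under attack while remaining tolerant of random errors.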

In summary, we divide the frequently used Monte Carlo moves into three
different move sets and construct the corresponding conformation-networks. The
networks all have small-world characteristics: (i) the local neighborhood
is more cliquish than that of random networks, and (ii) the characteristic path
length increases logarithmically with the number of nodes. The dimensionalities
of the conformation spaces are $d \ge 3$. Our analyses also indicate that the
networks are as robust as random graphs. Among the different Monte Carlo moves,
rigid rotation has a drastic effect on the geometric properties of the network:
(i) it renders the connection distribution non-Gaussian, and (ii) it greatly
reduces the characteristic path length. Thus, the rigid-rotation move may change
the folding kinetics significantly from that of the local moves, corner shift
and crankshaft move.

Figure 1: Examples of typical Monte Carlo moves: (a) end flips, (b) corner shift,
(c) crankshaft move, and (d) rigid rotation. The current conformation is shown
in thick lines, and possible new conformations are shown in broken lines.



Table 1: Various geometric quantities of the conformation-networks GA, GB,
and GC with different numbers of monomers m: the numbers of nodes N, the
numbers of edges E, the average edge number per node $\langle k \rangle$, and
the characteristic path length $\langle l \rangle$.

m                              10        12         14          16
N                            2034     15037     110188      802075
$E_{GA}$                     6966     57451     464687     3702485
$E_{GB}$                    13194    117839    1005304     8314161
$E_{GC}$                    16397    147673    1268544    10554679
$\langle k \rangle_{GA}$   6.8496    7.6413     8.4344      9.2323
$\langle k \rangle_{GB}$  12.9735   15.6732    18.2471     20.7316
$\langle k \rangle_{GC}$  16.1229   19.6413    23.0251     26.3184
$\langle l \rangle_{GA}$   7.6369   11.0731    15.0046     19.4403
$\langle l \rangle_{GB}$   4.5953    5.8286     7.0726      8.3236
$\langle l \rangle_{GC}$   3.9555    4.9611     5.9723      6.9869

Figure 2: The probability distribution of edges, P(k), versus
$\Delta k = k - \langle k \rangle$ for the networks (a) GA, (b) GB, and (c) GC.
Here, $\langle k \rangle$ is the average edge number per node, and the solid
lines are the best fits of the Gaussian function given in the text. For each
network, $\langle k \rangle$ versus log N(m), with the monomer number m ranging
from 8 to 16, is shown in the inset, and the straight solid line corresponds to
the relation $\langle k \rangle = a + b \log N(m)$ with the values of a and b
given in the text.

Figure 3: The ratios of the average clustering coefficients, $C_{av}$, of the
networks GA, GB, and GC to the average clustering coefficients of the
corresponding random networks, $C_{av}^{ran}$, versus log N, for the node number
N(m) with the monomer number m ranging from 8 to 16.

Figure 4: The scaled distribution function of the shortest path lengths P(l),
$\tilde{P}(l) = \sqrt{2\pi}\,\sigma P(l)$, versus
$\Delta l = (l - \langle l \rangle)/(\sqrt{2}\,\sigma)$ for (a) GA, (b) GB, and
(c) GC, with m = 10, 12, 14, and 16. The averages of the shortest path lengths
over all node pairs, $\langle l \rangle$, are given in Table 1, and the widths
$\sigma$ are $\sigma_{GA} = 0.0489\,m^{1.7}$, $\sigma_{GB} = 0.3057\,m^{0.6}$,
and $\sigma_{GC} = 0.5057\,m^{0.6}$. The solid lines are the Gaussian function
given in the text.

Figure 5: The characteristic path length $\langle l \rangle$ versus the
logarithm of the node number N, log N, for the networks (a) GA, (b) GB, and
(c) GC with the monomer number m ranging from 5 to 16. The insets are the plots
of $\log \langle l \rangle$ versus log N for the same data. The solid lines are
the limiting scaling forms given in the text.

Figure 6: The fraction of nodes contained in the largest cluster, S, and the
average node number, $\langle s \rangle$, contained in the fragmentary clusters
excluding the largest one, versus the fraction f of the nodes removed, for
(a) attack and (b) error tolerance of the networks GA, GB, and GC with m = 16.